Richard L. Graham, Sung-Eun Choi, David J. Daniel, Nehal N. Desai, Ronald G. Minnich, Craig E. Rasmussen, L. Dean Risinger, Mitchel W. Sukalski

June 2002 Proceedings of the 16th international conference on Supercomputing 
Publisher: ACM Press

The Los Alamos Message Passing Interface (LA-MPI) is an end-to-end network-failure-tolerant
message-passing system designed for terascale clusters. LA-MPI is a standard-compliant
implementation of MPI designed to tolerate network-related failures including
I/O bus errors, network card errors, and wire-transmission errors. This paper details the 
distinguishing features of LA-MPI, including support for concurrent use of multiple types of 
network interface, and reliable message transmission utilizi ... 

Keywords: MPI, fault tolerance, message passing 



2 Improving cluster availability using workstation validation
Taliver Heath, Richard P. Martin, Thu D. Nguyen 

June 2002 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the
2002 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems SIGMETRICS '02, Volume 30 Issue 1
Publisher: ACM Press 

We demonstrate a framework for improving the availability of cluster based Internet 
services. Our approach models Internet services as a collection of interconnected 
components, each possessing well defined interfaces and failure semantics. Such a 
decomposition allows designers to engineer high availability based on an understanding of 
the interconnections and isolated fault behavior of each component, as opposed to ad-hoc 
methods. In this work, we focus on using the entire commodity workstation ... 



3 Pursuing failure: the distribution of program failures in a profile space

William Dickinson, David Leon, Andy Podgurski 

September 2001 ACM SIGSOFT Software Engineering Notes , Proceedings of the 8th 
European software engineering conference held jointly with 9th ACM 
SIGSOFT international symposium on Foundations of software 
engineering ESEC/FSE-9, Volume 26 Issue 5
Publisher: ACM Press 

Observation-based testing calls for analyzing profiles of executions induced by potential 
test cases, in order to select a subset of executions to be checked for conformance to
requirements. A family of techniques for selecting such a subset is evaluated
experimentally. These techniques employ automatic cluster analysis to partition 
executions, and they use various sampling techniques to select executions from clusters. 
The experimental results support the hypothesis that with appropriate profil ... 

Keywords: adaptive sampling, cluster analysis, cluster filtering, failure-pursuit sampling, 
multivariate data analysis, observation-based testing, software testing 
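The cluster-filtering idea summarized in this abstract (partition executions by their profiles, then sample executions from each cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each execution profile is a numeric vector (for example, per-function call counts), uses plain k-means with Euclidean distance, and draws one execution per cluster. The helper names `kmeans` and `one_per_cluster` are hypothetical.

```python
import random


def kmeans(points, k, iters=20):
    """Plain k-means over profile vectors (hypothetical helper).

    Initial centroids are simply the first k points, which keeps this
    sketch deterministic; a real study would choose a better initialization.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each execution joins its nearest centroid
        # (squared Euclidean distance; ties go to the lower cluster index).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def one_per_cluster(labels, rng):
    """Pick one execution index at random from each cluster (hypothetical helper)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    return sorted(rng.choice(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical 2-D profiles: two groups of behaviorally similar executions.
    profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels = kmeans(profiles, k=2)
    print(labels)
    print(one_per_cluster(labels, random.Random(0)))
```

Drawing one execution per cluster spends manual checking on behaviorally distinct executions; the paper also evaluates other sampling schemes (its keywords mention adaptive and failure-pursuit sampling), which this sketch does not attempt.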



4 Technical papers: consistency management and quality assurance: Automated support for classifying software failure reports

Andy Podgurski, David Leon, Patrick Francis, Wes Masri, Melinda Minch, Jiayang Sun, Bin 
Wang 

May 2003 Proceedings of the 25th International Conference on Software Engineering
Publisher: IEEE Computer Society 

This paper proposes automated support for classifying reported software failures in order 
to facilitate prioritizing them and diagnosing their causes. A classification strategy is 
presented that involves the use of supervised and unsupervised pattern classification and 
multivariate visualization. These techniques are applied to profiles of failed executions in 
order to group together failures with the same or similar causes. The resulting 
classification is then used to assess the frequency and s ... 



5 Analysis and implementation of software rejuvenation in cluster systems
Kalyanaraman Vaidyanathan, Richard E. Harper, Steven W. Hunter, Kishor S. Trivedi 
June 2001 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the
2001 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems SIGMETRICS '01, Volume 29 Issue 1
Publisher: ACM Press 

Several recent studies have reported the phenomenon of "software aging", one in which 
the state of a software system degrades with time. This may eventually lead to 
performance degradation of the software or crash/hang failure or both. "Software 
rejuvenation" is a pro-active technique aimed to prevent unexpected or unplanned outages 
due to aging. The basic idea is to stop the running software, clean its internal state and 
restart it. In this paper, we discuss software rejuvenation as applied to ...

6 Manageability, availability, and performance in porcupine: a highly scalable, cluster-based mail service

Yasushi Saito, Brian N. Bershad, Henry M. Levy

August 2000 ACM Transactions on Computer Systems (TOCS), Volume 18 Issue 3
Publisher: ACM Press 

This paper describes the motivation, design and performance of Porcupine, a scalable mail 
server. The goal of Porcupine is to provide a highly available and scalable electronic mail





http://portal.acm.org/results.cfrn?coll=ACM&dl=ACM&CFID=61690139&^ 11/29/05 



Results (page 1): "mark kampe" cluster failure 



Page 3 of 6 



service using a large cluster of commodity PCs. We designed Porcupine to be easy to 
manage by emphasizing dynamic load balancing, automatic configuration, and graceful 
degradation in the presence of failures. Key to the system's manageability, availability, 
and performance is that sessions, data, and underlying ... 

Keywords: cluster, distributed systems, email, group membership protocol, load 
balancing, replication 



7 A failure and overload tolerance mechanism for continuous media servers
Rajesh Krishnan, Dinesh Venkatesh, Thomas D. C. Little 

November 1997 Proceedings of the fifth ACM international conference on Multimedia 
Publisher: ACM Press 

Keywords: caching, clustered video servers, content insertion, fault tolerance, interactive 
video-on-demand, overload tolerance, rate adaptive stream merging, stream clustering 



8 Method for distributed transaction commit and recovery using Byzantine Agreement within clusters of processors
C. Mohan, R. Strong, S. Finkelstein

July 1985 ACM SIGOPS Operating Systems Review, Volume 19 Issue 3
Publisher: ACM Press

This paper describes an application of Byzantine Agreement [DoSt82a, DoSt82c, LyFF82]
to distributed transaction commit. We replace the second phase of one of the commit
algorithms of [MoLi83] with Byzantine Agreement, providing certain trade-offs and 
advantages at the time of commit and providing speed advantages at the time of recovery 
from failure. The present work differs from that presented in [DoSt82b] by increasing the 
scope (handling a general tree of processes, and multi-cluster transac ... 

9 Method for distributed transaction commit and recovery using Byzantine Agreement within clusters of processors
C. Mohan, R. Strong, S. Finkelstein

August 1983 Proceedings of the second annual ACM symposium on Principles of distributed computing
Publisher: ACM Press 

This paper describes an application of Byzantine Agreement [DoSt82a, DoSt82c, LyFF82] 
to distributed transaction commit. We replace the second phase of one of the commit 
algorithms of [MoLi83] with Byzantine Agreement, providing certain trade-offs and
advantages at the time of commit and providing speed advantages at the time of recovery 
from failure. The present work differs from that presented in [DoSt82b] by increasing the 
scope (handling a general tree of processes, and multi-cluster tr ... 

10 Partition testing, stratified sampling, and cluster analysis
Andy Podgurski, Charles Yang

December 1993 ACM SIGSOFT Software Engineering Notes , Proceedings of the 1st
ACM SIGSOFT symposium on Foundations of software engineering
SIGSOFT '93, Volume 18 Issue 5
Publisher: ACM Press

We present a new approach to reducing the manual labor required to estimate software 
reliability. It combines the ideas of partition testing methods with those of stratified 
sampling to reduce the sample size necessary to estimate reliability with a given degree of 
precision. Program executions are stratified by using automatic cluster analysis to group 
those with similar features. We describe the conditions under which stratification is
effective for estimating softw ... 
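A minimal sketch of the stratified estimator this abstract relies on, assuming the strata (for example, clusters of executions) are already formed and that a hypothetical `is_failure` oracle stands in for manually checking an execution. Each stratum's sample failure proportion is weighted by that stratum's share of all executions. This illustrates textbook stratified sampling, not the paper's specific procedure; the function name is hypothetical.

```python
import random


def stratified_failure_estimate(strata, sample_sizes, is_failure, rng):
    """Estimate the population failure proportion from per-stratum samples.

    strata       -- list of strata, each a list of executions (e.g. one cluster each)
    sample_sizes -- executions to check in each stratum (1 <= n <= len(stratum))
    is_failure   -- hypothetical oracle standing in for a manual conformance check
    rng          -- random.Random instance, for reproducible sampling
    """
    total = sum(len(stratum) for stratum in strata)
    estimate = 0.0
    for stratum, n in zip(strata, sample_sizes):
        sample = rng.sample(stratum, n)  # simple random sample within the stratum
        p_hat = sum(1 for e in sample if is_failure(e)) / n
        estimate += (len(stratum) / total) * p_hat  # weight by stratum size
    return estimate


if __name__ == "__main__":
    # Hypothetical data: 80 passing executions form one stratum and
    # 20 failing executions form another (true failure rate 20/100).
    passing = [f"pass-{i}" for i in range(80)]
    failing = [f"fail-{i}" for i in range(20)]
    est = stratified_failure_estimate(
        [passing, failing], [2, 2], lambda e: e.startswith("fail"), random.Random(1)
    )
    print(est)  # 0.2
```

The example is deliberately extreme: when each stratum is homogeneous, two checks per stratum recover the true rate exactly, which is the effect stratification aims to approximate. With mixed strata the estimate varies from sample to sample, and the sample sizes needed for a given precision depend on how homogeneous the clusters are.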



11 Fastpath Optimizations for Cluster Recovery in Shared-Disk Systems

Randal Burns 

November 2004 Proceedings of the 2004 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society 

We describe the design and implementation of a clustering service for a high-performance, 
shared-disk file system. The service provides failure detection and recovery, reliable
end-to-end messaging, and a centralized and recoverable management interface. We
implement novel optimizations in the voting protocol that resolves cluster membership. 
Optimizations allow clusters to form as quickly as possible without introducing livelock or 
requiring timeout parameters to be tuned carefully. Our treatmen ... 



12 Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service

Yasushi Saito, Brian N. Bershad, Henry M. Levy
December 1999 ACM SIGOPS Operating Systems Review , Proceedings of the
seventeenth ACM symposium on Operating systems principles SOSP
'99, Volume 33 Issue 5
Publisher: ACM Press 

This paper describes the motivation, design, and performance of Porcupine, a scalable mail 
server. The goal of Porcupine is to provide a highly available and scalable electronic mail 
service using a large cluster of commodity PCs. We designed Porcupine to be easy to 
manage by emphasizing dynamic load balancing, automatic configuration, and graceful 
degradation in the presence of failures. Key to the system's manageability, availability, 
and performance is that sessions, data, and underlying serv ... 

13 A High Availability Clustering Solution
Phil Lewis 

August 1999 Linux Journal
Publisher: Specialized Systems Consultants, Inc.

Mr. Lewis tells us how he designed and implemented a simple high-availability solution for
his company.



14 Quantifying and Improving the Availability of High-Performance Cluster-Based Internet Services

Kiran Nagaraja, Neeraj Krishnan, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen
November 2003 Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Publisher: IEEE Computer Society

Cluster-based servers can substantially increase performance when nodes cooperate to
globally manage resources. However, in this paper we show that cooperation results in a
substantial availability loss, in the absence of high-availability mechanisms. Specifically,
we show that a sophisticated cluster-based Web server, which gains a factor of 3 in
performance through cooperation, increases service unavailability by a factor of 10 over a 
non-cooperative version. We then show how to augment this W ... 

15 Fast cluster failover using virtual memory-mapped communication
Yuanyuan Zhou, Peter M. Chen, Kai Li

May 1999 Proceedings of the 13th international conference on Supercomputing 
Publisher: ACM Press 

16 A Self-Organizing Storage Cluster for Parallel Data-Intensive Applications
Hong Tang, Aziz Gulbeden, Jingyu Zhou, William Strathearn, Tao Yang, Lingkun Chu
November 2004 Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Publisher: IEEE Computer Society 

Cluster-based storage systems are popular for data-intensive applications and it is
desirable yet challenging to provide incremental expansion and high availability while
achieving scalability and strong consistency. This paper presents the design and
implementation of a self-organizing storage cluster called Sorrento, which targets
data-intensive workload with highly parallel requests and low write-sharing patterns. Sorrento
automatically adapts to storage node joins and departures, and the sys ... 

17 Cluster-based scalable network services
Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier
October 1997 ACM SIGOPS Operating Systems Review , Proceedings of the sixteenth
ACM symposium on Operating systems principles SOSP '97, Volume 31 Issue 5
Publisher: ACM Press

18 Cluster-cover: a theoretical framework for a class of VLSI-CAD optimization problems
C.-J. Shi, J. A. Brzozowski

January 1998 ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 3 Issue 1
Publisher: ACM Press 

This article introduces a mathematical framework called cluster-cover. We show that this 
framework captures the combinatorial structure of a class of VLSI design optimization 
problems, including two-level logic minimization, constrained encoding, multilayer 
topological planar routing, application timing assignment for delay-fault testing, and 
minimization of monitoring logic for BIST enhancement. These apparently unrelated
problems can all be cast into two metaproblems in our framework: fl ... 

Keywords: NP-completeness, cluster-cover, logic minimization, self-checking logic design,
state assignment, topological routing 
19 Cellular disco: resource management using virtual clusters on shared-memory multiprocessors

Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum

August 2000 ACM Transactions on Computer Systems (TOCS), Volume 18 issue 3 
Publisher: ACM Press 

Despite the fact that large-scale shared-memory multiprocessors have been commercially 
available for several years, system software that fully utilizes all their features is still not 
available, mostly due to the complexity and cost of making the required changes to the 
operating system. A recently proposed approach, called Disco, substantially reduces this 
development cost by using a virtual machine monitor that leverages the existing operating 
system technology. In this paper we present a ... 

Keywords: fault containment, resource management, scalable multiprocessors, virtual
machines 



20 Cellular Disco: resource management using virtual clusters on shared-memory multiprocessors

Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum

December 1999 ACM SIGOPS Operating Systems Review , Proceedings of the
seventeenth ACM symposium on Operating systems principles SOSP
'99, Volume 33 Issue 5
Publisher: ACM Press 

Despite the fact that large-scale shared-memory multiprocessors have been commercially 
available for several years, system software that fully utilizes all their features is still not 
available, mostly due to the complexity and cost of making the required changes to the 
operating system. A recently proposed approach, called Disco, substantially reduces this 
development cost by using a virtual machine monitor that leverages the existing operating 
system technology. In this paper we present a syste ... 
1 The costs and limits of availability for replicated services
Haifeng Yu, Amin Vahdat

October 2001 ACM SIGOPS Operating Systems Review , Proceedings of the eighteenth
ACM symposium on Operating systems principles SOSP '01, Volume 35 Issue 5
Publisher: ACM Press

As raw system and network performance continues to improve at exponential rates, the 
utility of many services is increasingly limited by availability rather than performance. A 
key approach to improving availability involves replicating the service across multiple, 
wide-area sites. However, replication introduces well-known tradeoffs between service 
consistency and availability. Thus, this paper explores the benefits of dynamically trading 
consistency for availability using a continuous const ... 

4 Survey of software tools for evaluating reliability, availability, and serviceability
Allen M. Johnson, Miroslaw Malek

September 1988 ACM Computing Surveys (CSUR), Volume 20 Issue 4
Publisher: ACM Press 

In computer design, it is essential to know the effectiveness of different design options in
improving performance and dependability. Various software tools have been created to 
evaluate these parameters, applying both analytic and simulation techniques, and this 
paper reviews those related primarily to reliability, availability, and serviceability. The 
purpose, type of models used, type of systems modeled, inputs, and outputs are given for 
each package. Examples of some of the key modeling ... 

5 Multiview access protocols for large-scale replication

Xiangning Liu, Abdelsalam Helal, Weimin Du

June 1998 ACM Transactions on Database Systems (TODS), Volume 23 Issue 2
Publisher: ACM Press 

The article proposes a scalable protocol for replication management in large-scale
replicated systems. The protocol organizes sites and data replicas into a tree-structured,
hierarchical cluster architecture. The basic idea of the protocol is to accomplish the
complex task of updating replicated data with a very large number of replicas by a set of
related but independently committed transactions. Each transaction is responsible for
updating replicas in exactly one cluster and invoking add ...

Keywords: data replication, large-scale systems, multiview access 



6 Session 3: Minimal replication cost for availability
Haifeng Yu, Amin Vahdat

July 2002 Proceedings of the twenty-first annual symposium on Principles of
distributed computing
Publisher: ACM Press

Today, the utility of many replicated Internet services is limited by availability rather than 
raw performance. To better understand the effects of replica placement on availability, we 
propose the problem of minimal replication cost for availability. Let replication cost be the 
cost associated with replica deployment, dynamic replica creation and teardown at n 
candidate locations. Given client access patterns, replica failure patterns, network partition 
patterns, a required consis ... 

7 DNS: Availability, usage, and deployment characteristics of the domain name system
Jeffrey Pang, James Hendricks, Aditya Akella, Roberto De Prisco, Bruce Maggs, Srinivasan Seshan

October 2004 Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
Publisher: ACM Press

The Domain Name System (DNS) is a critical part of the Internet's infrastructure, and is 
one of the few examples of a robust, highly-scalable, and operational distributed system. 
Although a few studies have been devoted to characterizing its properties, such as its 
workload and the stability of the top-level servers, many key components of DNS have not 
yet been examined. Based on large-scale measurements taken from servers in a large
content distribution network, we present a detailed study of ... 

Keywords: DNS, availability, federated 



8 A characterization of the simple failure-biasing method for simulations of highly reliable Markovian Systems

Marvin K. Nakayama

January 1994 ACM Transactions on Modeling and Computer Simulation (TOMACS), Volume 4 Issue 1
Publisher: ACM Press 

Simple failure biasing is an importance-sampling technique used to reduce the variance of
estimates of performance measures and their gradients in simulations of highly reliable 
Markovian systems. Although simple failure biasing yields bounded relative error for the 
performance measure estimate when the system is balanced, it may not provide bounded 
relative error when the system is unbalanced. In this article, we provide a characterization 
of when the simple failure-biasing meth ... 

Keywords: balanced failure biasing, gradient estimation, highly reliable systems, 
importance sampling, likelihood ratios, simple failure biasing 



9 Bounding availability of repairable computer systems
R. R. Muntz, E. de Souza e Silva, A. Goyal

April 1989 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the
1989 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems SIGMETRICS '89, Volume 17 Issue 1
Publisher: ACM Press 

Markov models are widely used for the analysis of availability of computer/communication
systems. Realistic models often involve state space cardinalities that are so large that it is 
impractical to generate the transition rate matrix let alone solve for availability measures. 
Various state space reduction methods have been developed, particularly for transient 
analysis. In this paper we present an approximation technique for determining steady 
state availability. Of particular interest is ... 
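For a sense of what solving a Markov model for steady-state availability involves, here is a minimal sketch for a tiny, hypothetical model: two redundant units, each failing at rate `lam`, a single repairer working at rate `mu`, and a system that is up while at least one unit works. It solves the balance equations πQ = 0 with Σπ = 1 directly by Gaussian elimination. This is the exact brute-force approach that becomes impractical for the very large state spaces the abstract mentions; it is not the paper's approximation technique, and the function and model names are illustrative only.

```python
def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Row i of A is the balance equation for state i: sum_j pi_j * Q[j][i] = 0.
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # The balance equations are linearly dependent, so replace the last one
    # with the normalization condition sum(pi) = 1.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi


def two_unit_generator(lam, mu):
    """Hypothetical model. States: 0 = both units up, 1 = one down, 2 = both down."""
    return [
        [-2 * lam, 2 * lam, 0.0],
        [mu, -(lam + mu), lam],
        [0.0, mu, -mu],
    ]


if __name__ == "__main__":
    pi = steady_state(two_unit_generator(lam=1.0, mu=10.0))
    availability = pi[0] + pi[1]  # the system is up in states 0 and 1
    print(f"steady-state availability = {availability:.6f}")  # 0.983607 (= 60/61)
```

Because this small chain is a birth-death process, the result can be checked by hand: π2 = 2λ²/(μ² + 2λμ + 2λ²) = 2/122 = 1/61 for λ = 1 and μ = 10, so the availability is 60/61. Each added unit or failure mode multiplies the number of states, which is why exact solution stops scaling.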

10 Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids

Derrick Kondo, Andrew A. Chien, Henri Casanova

November 2004 Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Publisher: IEEE Computer Society

Desktop grids are popular platforms for high throughput applications, but due to their
inherent resource volatility it is difficult to exploit them for applications that require rapid
turnaround. Efficient desktop grid execution of short-lived applications is an attractive
proposition and we claim that it is achievable via intelligent resource selection. We propose 
three general techniques for resource selection: resource prioritization, resource exclusion, 
and task duplication. We use these techni ... 



11 Efficient exploration of availability models guided by failure distances
Juan A. Carrasco, Javier Escriba, Angel Calderon

May 1996 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the
1996 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems SIGMETRICS '96, Volume 24 Issue 1
Publisher: ACM Press 

Recently, a method to bound the steady-state availability using the failure distance
concept has been proposed. In this paper we refine that method by introducing state
space exploration techniques. In the methods proposed here, the state space is 
incrementally generated based on the contributions to the steady-state availability band of 
the states in the frontier of the currently generated state space. Several state space 
exploration algorithms are evaluated in terms of bounds quality and memor ... 




12 Research papers: storage, indexing, and system architecture: Guaranteeing correctness and availability in P2P range indices

Prakash Linga, Adina Crainiceanu, Johannes Gehrke, Jayavel Shanmugasundaram
June 2005 Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Publisher: ACM Press 

New and emerging P2P applications require sophisticated range query capability and also 
have strict requirements on query correctness, system availability and item availability. 
While there has been recent work on developing new P2P range indices, none of these 
indices guarantee correctness and availability. In this paper, we develop new techniques 
that can provably guarantee the correctness and availability of P2P range indices. We
develop our techniques in the context of a general P2P indexing ... 
